Placement Offers Collection
This document provides comprehensive documentation for the PlacementOffers collection schema that stores extracted and structured placement offer data from emails. It explains the unique identifier system, field semantics, embedded documents, arrays, enumerations, validation rules, and operational flows. It also includes example documents illustrating different offer scenarios and student selection patterns.
The PlacementOffers collection is part of the MongoDB database used by the SuperSet Telegram Notification Bot. The schema and operational logic are defined across documentation and service modules:
Schema definition and indexes are documented in DATABASE.md
Data extraction and validation logic are implemented in placement_service.py
Persistence and merge logic are implemented in database_service.py
Example datasets are provided in placement_offers.json and placement_offers.json.backup
[Pipeline diagram] Extraction (placement_service.py) → Validation & Enhancement (placement_service.py) → Persistence Layer (database_service.py) → PlacementOffers Collection (MongoDB) → Queries & Analytics (DATABASE.md)
The PlacementOffers collection schema defines the structure for storing placement offer records extracted from emails; the individual fields and their semantics are listed under Schema Fields and Semantics below.
Note: The repository contains two variants of the schema:
Current schema (used by the application) with fields like role, package, students_selected, offer_status, processing_status, notifications_sent, extracted_at, created_at, updated_at
Historical schema (presented in DATABASE.md) with fields like company, role, package, students_selected, offer_status, source_email, processing_status, notifications_sent, extracted_at, created_at, updated_at
The current schema aligns with the application’s implementation and the example datasets.
The PlacementOffers collection participates in a data pipeline that extracts, validates, persists, and notifies about placement offers:
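The pipeline can be sketched end to end. The helper names below (extract_offer, validate_offer, save_offer, notify) are hypothetical stand-ins for the real functions in placement_service.py and database_service.py, with trivial stub bodies so the control flow runs:

```python
# Hypothetical stand-ins for the real pipeline helpers; each stub is trivial
# so the extract → validate → persist → notify flow can run end to end.
def extract_offer(email_body: str) -> dict:
    # Real extraction parses structured offer data out of the email text.
    company = email_body.split(":", 1)[0].strip()
    return {"company": company, "students_selected": [{"name": "A. Student"}]}

def validate_offer(offer: dict) -> list:
    # Real validation enforces the rules documented below.
    return [] if len(offer.get("company", "")) >= 2 else ["company missing"]

def save_offer(offer: dict) -> dict:
    # Real persistence upserts into the PlacementOffers collection.
    return {**offer, "processing_status": "processed"}

def notify(offer: dict) -> None:
    # Real notification sends Telegram / web-push messages.
    offer.setdefault("notifications_sent", {})["telegram"] = True

def process_email(email_body: str):
    """Run one email through the pipeline; returns None on validation failure."""
    offer = extract_offer(email_body)
    if validate_offer(offer):
        return None
    saved = save_offer(offer)
    notify(saved)
    return saved
```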
Schema Fields and Semantics
offer_id: Unique string identifier for the offer; separate from MongoDB’s ObjectId
company: Company name associated with the offer
role: Role offered
package: Embedded document with base, bonus, total, and currency fields
students_selected: Array of student records with name, email, branch, roll_number
offer_status: Enumeration with values ‘pending’, ‘confirmed’, ‘completed’
source_email: Email address from which the offer was extracted (historical schema only)
processing_status: Enumeration with values ‘new’, ‘processed’, ‘notified’
notifications_sent: Nested object with telegram and webpush boolean flags
extracted_at: Timestamp when the offer was extracted from the email
created_at: Timestamp when the record was first saved
updated_at: Timestamp when the record was last updated
All fields above belong to the current schema except source_email, which appears only in the historical schema documented in DATABASE.md.
Data Types and Constraints
offer_id: String, unique index
company: String
role: String
package.base: Number
package.bonus: Number
package.total: Number
package.currency: String
students_selected: Array of embedded documents
offer_status: String enum
source_email: String
processing_status: String enum
notifications_sent.telegram: Boolean
notifications_sent.webpush: Boolean
extracted_at: Date
created_at: Date
updated_at: Date
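The declared types can be summarized as a typed model. This is a documentation sketch only, not code from the repository; the exact structures used by placement_service.py may differ:

```python
from datetime import datetime
from typing import List, TypedDict

# Sketch of the current schema as Python typed dicts. Field names follow the
# documented schema; the class names themselves are invented for this sketch.

class Package(TypedDict):
    base: float      # Number
    bonus: float     # Number
    total: float     # Number
    currency: str    # e.g. "INR"

class Student(TypedDict):
    name: str
    email: str
    branch: str
    roll_number: str

class NotificationsSent(TypedDict):
    telegram: bool
    webpush: bool

class PlacementOffer(TypedDict):
    offer_id: str               # unique; distinct from MongoDB's _id
    company: str
    role: str
    package: Package
    students_selected: List[Student]
    offer_status: str           # 'pending' | 'confirmed' | 'completed'
    processing_status: str      # 'new' | 'processed' | 'notified'
    notifications_sent: NotificationsSent
    extracted_at: datetime
    created_at: datetime
    updated_at: datetime
```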
Validation Rules
Numeric fields (package.base, package.bonus, package.total) must be numeric
Arrays (students_selected) must be present and non-empty for valid offers
offer_status must be one of ‘pending’, ‘confirmed’, ‘completed’
processing_status must be one of ‘new’, ‘processed’, ‘notified’
offer_id must be unique
company must be present and meaningful (length >= 2)
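These rules can be expressed as a small checker. This is a hedged sketch; the actual checks live in placement_service.py and may differ in detail:

```python
# Sketch of the documented validation rules; returns a list of error strings
# (an empty list means the offer passes). Not the repository's implementation.

VALID_OFFER_STATUSES = {"pending", "confirmed", "completed"}
VALID_PROCESSING_STATUSES = {"new", "processed", "notified"}

def validate_offer(offer: dict) -> list:
    errors = []

    # company must be present and meaningful (length >= 2)
    company = (offer.get("company") or "").strip()
    if len(company) < 2:
        errors.append("company must be at least 2 characters")

    # numeric package fields must be numeric
    package = offer.get("package") or {}
    for field in ("base", "bonus", "total"):
        value = package.get(field)
        if value is not None and not isinstance(value, (int, float)):
            errors.append(f"package.{field} must be numeric")

    # students_selected must be present and non-empty
    if not offer.get("students_selected"):
        errors.append("students_selected must be a non-empty array")

    # enum checks
    if offer.get("offer_status") not in VALID_OFFER_STATUSES:
        errors.append("offer_status must be pending/confirmed/completed")
    if offer.get("processing_status") not in VALID_PROCESSING_STATUSES:
        errors.append("processing_status must be new/processed/notified")

    return errors
```

Uniqueness of offer_id is not checked here, since it is enforced by the database's unique index rather than per-document validation.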
Example Documents
Below are example documents illustrating different offer scenarios and student selection patterns:
Single student selection with role and package
Multiple students selected for the same role
Multiple roles with different packages
Internship offer with stipend handling
Conditional/full-time offer with detailed package breakdown
These examples are derived from the repository’s example datasets and demonstrate typical structures encountered in practice.
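Two of these scenarios might look like the following. The values are invented for illustration and are not drawn from placement_offers.json:

```python
from datetime import datetime, timezone

# Illustrative documents in the shape of the current schema. All values,
# including the offer_id format, are hypothetical.

# Scenario 1: single student selection with role and package.
single_student_offer = {
    "offer_id": "offer-acme-2024-001",
    "company": "Acme Corp",
    "role": "Software Engineer",
    "package": {"base": 1200000, "bonus": 100000, "total": 1300000, "currency": "INR"},
    "students_selected": [
        {"name": "A. Student", "email": "a.student@example.edu",
         "branch": "CSE", "roll_number": "21CS001"},
    ],
    "offer_status": "confirmed",
    "processing_status": "processed",
    "notifications_sent": {"telegram": True, "webpush": False},
    "extracted_at": datetime(2024, 7, 1, tzinfo=timezone.utc),
    "created_at": datetime(2024, 7, 1, tzinfo=timezone.utc),
    "updated_at": datetime(2024, 7, 2, tzinfo=timezone.utc),
}

# Scenario 2: multiple students selected for the same role.
multi_student_offer = {
    **single_student_offer,
    "offer_id": "offer-acme-2024-002",
    "students_selected": single_student_offer["students_selected"] + [
        {"name": "B. Student", "email": "b.student@example.edu",
         "branch": "ECE", "roll_number": "21EC014"},
    ],
    "processing_status": "new",
    "notifications_sent": {"telegram": False, "webpush": False},
}
```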
Processing Logic and Merge Behavior
The database service implements merge logic for PlacementOffers:
Key behaviors:
Merges roles by role name, updating package details when higher packages are found
Merges students by enrollment_number or name, prioritizing higher packages
Emits events for new offers and updates with newly added students
Skips offers without a company name
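A minimal sketch of this merge behavior follows. The real implementation in database_service.py may differ; in particular, the text above mentions keying students by enrollment_number, while this sketch uses the schema's roll_number field with name as a fallback:

```python
# Hedged sketch of the merge logic: keep the higher package for the same
# role, merge students by roll_number (name as fallback), and return the
# newly added students so the caller can emit notification events.

def merge_offer(existing: dict, incoming: dict):
    """Merge an incoming offer into an existing one.

    Returns (merged_document, newly_added_students).
    """
    merged = dict(existing)

    # Update package details when a higher package is found for the role.
    if incoming.get("role") == existing.get("role"):
        old_total = (existing.get("package") or {}).get("total", 0)
        new_total = (incoming.get("package") or {}).get("total", 0)
        if new_total > old_total:
            merged["package"] = incoming["package"]

    # Merge students by roll_number, falling back to name.
    def key(student: dict) -> str:
        return student.get("roll_number") or student.get("name", "")

    seen = {key(s) for s in existing.get("students_selected", [])}
    new_students = [s for s in incoming.get("students_selected", [])
                    if key(s) not in seen]
    merged["students_selected"] = (
        existing.get("students_selected", []) + new_students
    )
    return merged, new_students
```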
Data Model Relationships
The PlacementOffers collection schema embeds its related data directly in each document: the package embedded document, the students_selected array of student records, and the notifications_sent flags object.
The PlacementOffers collection depends on several components:
Extraction pipeline (placement_service.py) for generating structured offer data
Database service (database_service.py) for persistence and merge logic
Documentation (DATABASE.md) for schema definition and indexes
Example datasets (placement_offers.json, placement_offers.json.backup) for validation and testing
[Dependency diagram] Extraction Pipeline (placement_service.py) → Schema Definition (DATABASE.md) → Database Service (database_service.py) → PlacementOffers Collection; Example Datasets (placement_offers.json(.backup)) → Validation & Testing → Database Service
Indexes: Unique index on offer_id, single-field indexes on company and processing_status, and descending index on created_at
Query patterns: Sorting by created_at, filtering by processing_status, and aggregation for counting students per company
Batch operations: Using insert_many for bulk ingestion and update_many for bulk updates
Caching: Consider caching frequently accessed statistics and recent offers
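The indexes, query pattern, and aggregation above can be written down concretely. This sketch uses raw MongoDB sort-direction integers (pymongo's ASCENDING and DESCENDING constants are 1 and -1) so it carries no driver dependency; a live collection handle is left out:

```python
# pymongo's ASCENDING and DESCENDING constants are the integers 1 and -1;
# using the raw values keeps this sketch dependency-free.
ASCENDING, DESCENDING = 1, -1

# Index specifications: each pair is (keys, options) as would be passed to
# collection.create_index(keys, **options) on a live pymongo collection.
INDEX_SPECS = [
    ([("offer_id", ASCENDING)], {"unique": True}),    # unique index
    ([("company", ASCENDING)], {}),                   # single-field index
    ([("processing_status", ASCENDING)], {}),         # single-field index
    ([("created_at", DESCENDING)], {}),               # recency sorting
]

# Query pattern: newest unprocessed offers first.
RECENT_NEW_OFFERS = {
    "filter": {"processing_status": "new"},
    "sort": [("created_at", DESCENDING)],
    "limit": 10,
}

# Aggregation pipeline: count selected students per company.
STUDENTS_PER_COMPANY = [
    {"$project": {"company": 1, "n_students": {"$size": "$students_selected"}}},
    {"$group": {"_id": "$company", "total_students": {"$sum": "$n_students"}}},
]
```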
Common issues and resolutions:
Duplicate offers: The merge logic updates existing documents and emits events for newly added students
Missing company name: Offers without a company are skipped during processing
Validation failures: Extraction pipeline returns validation errors; review extraction prompts and retry logic
Privacy concerns: Forwarded sender information is sanitized and not included in user-facing fields
The PlacementOffers collection schema provides a robust foundation for storing and managing placement offer data extracted from emails. Its design supports efficient querying, maintains data integrity through validation and merge logic, and enables scalable notification workflows. The schema aligns with the application’s extraction and persistence layers while maintaining flexibility for diverse offer scenarios and student selection patterns.